Efficient Floating-point Based Block LU Decomposition on FPGAs
Abstract
In this paper, we propose an architecture for floating-point based LU decomposition of large matrices. Our proposed architecture is based on the well-known concept of blocking and uses pipelined floating-point units to obtain high throughput. We first analyze the effects of block size and of the deeply pipelined floating-point units on the performance of the architecture. We then analyze and compare the performance of our double-precision design with that of a general-purpose processor (GPP) based design. Initial results show that an improvement of up to 23x in total computation time can be achieved. Finally, we analyze the impact of algorithm-level design (varying the block size) on the system-wide energy dissipation and resource usage of our designs. Categories: 1. Theory, Mapping and Parallelization; 4. Applications
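The abstract does not describe the hardware data path, so the following is only a minimal software sketch of the blocking idea it refers to: a generic right-looking block LU factorization without pivoting. The function name `block_lu` and the parameter `block_size` are hypothetical and are not taken from the paper; a practical implementation would normally add pivoting for numerical stability.

```python
def block_lu(a, block_size):
    """Right-looking block LU without pivoting (illustrative sketch).

    `a` is a square matrix given as a list of lists. Returns (L, U) with
    L unit lower triangular and U upper triangular such that L * U == a.
    Assumes every pivot encountered is nonzero.
    """
    n = len(a)
    a = [[float(x) for x in row] for row in a]  # work on a copy
    for k in range(0, n, block_size):
        end = min(k + block_size, n)
        # Step 1: factor the column panel (diagonal block plus the rows below it).
        for j in range(k, end):
            for i in range(j + 1, n):
                a[i][j] /= a[j][j]
                for c in range(j + 1, end):
                    a[i][c] -= a[i][j] * a[j][c]
        # Step 2: row panel, solve L11 * U12 = A12 by forward substitution.
        for j in range(k, end):
            for i in range(j + 1, end):
                for c in range(end, n):
                    a[i][c] -= a[i][j] * a[j][c]
        # Step 3: trailing submatrix update, A22 -= L21 * U12.
        for i in range(end, n):
            for j in range(k, end):
                l_ij = a[i][j]
                for c in range(end, n):
                    a[i][c] -= l_ij * a[j][c]
    # Split the in-place result into explicit L and U factors.
    lower = [[a[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    upper = [[a[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return lower, upper
```

Step 3 is a matrix-matrix product and dominates the operation count for large matrices, which is why the choice of block size affects how well a blocked design can keep pipelined floating-point units busy.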
Related resources
Multi-FPGA based High Performance LU Decomposition
LU Decomposition is a linear algebra routine that is used to bring down the complexity of solving a system of linear equations with multiple RHS. Its application can be found in computational physics (modeling 2-D structures), image processing, and computational chemistry (design and analysis of molecular structures). This paper investigates the hardware software co-design of large scale block-...
Perspectives for the Use of Field Programmable Gate Arrays for Finite Element Computations
We have studied how the solution of partial differential equations by means of finite element methods could be accelerated using Field Programmable Gate Arrays (FPGAs). First, we discuss in general the capabilities of current FPGA technology for floating-point implementations of number crunching. Based on practical results for basic floating-point operators, performance limits are outlined. Then...
Implementation of LU Decomposition and QR Decomposition on Parallel Processing Systems
One of the earliest attempts to implement LU Decomposition with special-purpose hardware used systolic/wavefront arrays [2]. Different proposals for the processing elements (PEs) of systolic/wavefront arrays have been made [3][4][5]. These ideas were not implemented in circuits at that time, and the performance of these architectures was not quantitatively evaluated either. In 1994, E. Casseau [6] i...
Variable Precision Floating-Point Divide and Square Root for Efficient FPGA Implementation of Image and Signal Processing Algorithms
Field Programmable Gate Arrays (FPGAs) are frequently used to accelerate signal and image processing algorithms due to their flexibility, relatively low cost, high performance and fast time to market. For those applications where the data has large dynamic range, floating-point arithmetic is desirable due to the inherent limitations of fixed-point arithmetic. Moreover, optimal reconfigurable ha...
Exploiting mixed-mode parallelism for matrix operations on the HERA architecture through reconfiguration
Recent advances in multi-million-gate platform FPGAs have made it possible to design and implement complex parallel systems on a programmable chip (PSOPCs) that also incorporate hardware floating-point units (FPUs). These options take advantage of resource reconfiguration. In contrast to the majority of the FPGA community that still employs reconfigurable logic to develop algorithm-specific cir...